Feature Reduction with Inconsistency
Authors
Abstract
Feature selection is a classical problem in machine learning. The challenge is to design a method whose selected features retain all of the internal semantic correlation of the original feature set. The authors present a general approach to feature selection via rough set based reduction, which keeps the selected features semantically consistent with the original feature set. A new concept named inconsistency is proposed, which can be used to calculate the positive region easily and quickly, with only linear time complexity. Some properties of inconsistency are also given, such as its monotonicity. The authors also propose three inconsistency based attribute reduction generation algorithms with different search policies. Finally, a “mini-saturation” bias is presented to choose the proper reduction for further predictive modeling.

DOI: 10.4018/jcini.2010040106
International Journal of Cognitive Informatics and Natural Intelligence, 4(2), 77-87, April-June 2010
Copyright © 2010, IGI Global.

The above approaches have two disadvantages. Evaluating the features one by one destroys the internal semantic relation of the original feature set, while evaluating whole subsets of features is inefficient in both time and space. To overcome these two problems, many researchers have introduced rough set based reduction into feature selection (Hu, Zhao, Xie, & Yu, 2007; Jelonek, Krawiec, & Slowinski, 1995; Lin & Yin, 2004; Zhong, Dong, & Ohsuga, 2001; Swiniarski & Skowron, 2003). Such reduction can preserve the semantic correlation of the original features (Jensen & Shen, 2004). In this paper, we address these two weaknesses of traditional feature selection and introduce a new feature selection approach based on rough set reduction.
We propose a new concept named inconsistency, which is easy to calculate and can quickly evaluate whether an attribute set is a reduct. The rest of this paper is organized as follows: Section 2 presents the definitions and concepts related to inconsistency, together with some of its properties; Section 3 proposes three inconsistency based reduction algorithms with different search policies; Section 4 presents the “mini-saturation” bias based reduct selection policy for choosing the “optimal” reduct from multiple reducts for further predictive modeling; and Section 5 concludes the paper.

1. Related Definitions and Concepts

Some related definitions and concepts are presented as follows.

Definition 1 (Positive region). Let $P$ and $Q$ be two sets in the information system $U(C, D)$, with $P, Q \subseteq C \cup D$. The positive region of $Q$ in $P$, denoted $POS_P(Q)$, can be calculated as:

$$POS_P(Q) = \bigcup_{X \in U/IND(Q)} \underline{P}X$$
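To make Definition 1 concrete, here is a minimal Python sketch of the positive region in its standard rough set reading: the union of those equivalence classes of IND(P) that lie entirely inside a single class of U/IND(Q). It is the textbook computation, not the inconsistency based method the paper proposes. The function names, the toy decision table, and the attribute names `a`, `b`, `d` are all assumptions made for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into the equivalence classes of IND(attrs)."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def positive_region(rows, P, Q):
    """Return the indices of the objects in POS_P(Q).

    A P-class belongs to the positive region exactly when all of its
    members agree on every attribute in Q, i.e. when it is contained
    in one class of U/IND(Q).
    """
    pos = set()
    for block in partition(rows, P):
        q_values = {tuple(rows[i][a] for a in Q) for i in block}
        if len(q_values) == 1:
            pos |= block
    return pos


# Hypothetical decision table: condition attributes a, b; decision d.
rows = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 0, "d": "yes"},   # conflicts with row 0 on d
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
    {"a": 1, "b": 1, "d": "no"},
]

print(positive_region(rows, ["a", "b"], ["d"]))  # {2, 3, 4}
print(positive_region(rows, ["a"], ["d"]))       # {3, 4}
```

Rows 0 and 1 agree on `a` and `b` but disagree on `d`, so they fall outside the positive region. Dropping `b` also pushes row 2 out, which shows how the positive region can shrink when attributes are removed.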
Similar resources
Consistency-based search in feature selection
Feature selection is an effective technique in dealing with dimensionality reduction. For classification, it is used to find an “optimal” subset of relevant features such that the overall accuracy of classification is increased while the data size is reduced and the comprehensibility is improved. Feature selection methods contain two important aspects: evaluation of a candidate feature subset a...
Rough Set Approaches to Unsupervised Neural Network Based Pattern Classifier
Unsupervised neural network based pattern classification is a widely popular choice for many real-time applications. Such applications always face the challenge of processing data with a lot of consistency, inconsistency, ambiguity, or incompleteness. Hence, a strong approximation tool is always needed to deal with such challenges. Rough set is one such tool and various approaches based on Rough set, ...
Hybrid Mammogram Classification Using Rough Set and Fuzzy Classifier
We propose a computer aided detection (CAD) system for the detection and classification of suspicious regions in mammographic images. This system combines a dimensionality reduction module (using principal component analysis), a feature extraction module (using independent component analysis), and a feature subset selection module (using rough set model). Rough set model is used to reduce the e...
Feature reduction of hyperspectral images: Discriminant analysis and the first principal component
When the number of training samples is limited, feature reduction plays an important role in classification of hyperspectral images. In this paper, we propose a supervised feature extraction method based on discriminant analysis (DA) which uses the first principal component (PC1) to weight the scatter matrices. The proposed method, called DA-PC1, copes with the small sample size problem and has...
A Modified Minimum Classification Error (MCE) Training Algorithm for Dimensionality Reduction
Dimensionality reduction is an important problem in pattern recognition. There is a tendency to use more and more features to improve the performance of classifiers. However, not all the newly added features are helpful for classification. Therefore, it is necessary to reduce the dimensionality of the feature space for effective and efficient pattern recognition. Two popular methods for dimensional...
Journal title:
- IJCINI
Volume 4, Issue
Pages -
Publication date: 2010